(54) Automated stabilization method for digital image sequences



(57) A method and structure for stabilizing a motion image formed using a sequence of successive frames,
which includes: calculating a motion vector field between adjacent frames; forming a motion vector histogram
from horizontal and vertical components of the motion vector field; applying a threshold to the motion vector
histogram to produce a thresholded motion vector histogram; generating average horizontal and vertical motion
components from the thresholded motion vector histogram; filtering the average horizontal and vertical
motion components over a number of frames to identify unwanted horizontal and vertical motion components for
each of the frames; and stabilizing the image sequence by shifting each frame according to the corresponding
unwanted horizontal and vertical motion components.





EP1 117 251 A1 



Description 

FIELD OF THE INVENTION 

[0001] The present invention generally relates to digital image processing and more particularly to a system and
method for removing unwanted motion in a digital image sequence.

BACKGROUND OF THE INVENTION 

[0002] Digital image sequences, such as those obtained from digital video cameras or from the scanning of motion
picture film, often contain unwanted motion between successive frames in the sequence. There are many potential 
causes of this unwanted motion, including camera shake at the time of image capture, or frame-to-frame positioning 
errors (also known as jitter or hop and weave) when a film sequence is scanned. The process of removing this unwanted 
motion is termed image stabilization. 

[0003] Some systems use optical, mechanical, or other physical means to correct for the unwanted motion at the
time of capture or scanning. However, these systems are often complex and expensive, and they cannot correct for
unwanted motion in a digital image sequence that was produced by an unknown and uncharacterized device. To provide 
stabilization for a generic digital image sequence, several digital image processing methods have been developed and 
described in the prior art. 

[0004] A number of digital image processing methods use a specific camera motion model to estimate one or more
parameters such as zoom, translation, rotation, etc. between successive frames in the sequence. These parameters
are computed from a motion vector field that describes the correspondence between image points in two successive
frames. The resulting parameters can then be filtered over a number of frames to provide smooth motion. An example
of such a system can be found in U.S. patent 5,629,988 to Burt et al. A fundamental assumption in these systems is
that a global transformation dominates the motion between adjacent frames. In the presence of significant local motion,
such as multiple objects moving with independent motion trajectories, these methods may fail due to the computation
of erroneous global motion parameters.

[0005] Other image processing methods for digital image stabilization are designed primarily for digital video camera
applications where system constraints include minimal buffering requirements and near real-time processing. As a
result, these methods are limited to applying unwanted motion correction between only two frames at a given time,
which prohibits filtering of the motion parameters over multiple frames. An example of such a system is described in
U.S. patent 5,748,231 to Park et al. In this method, a weighted average motion vector is computed from the motion
vector field corresponding to two successive frames. The weightings are determined from various statistical measures
that indicate the reliability of a given motion vector. The weighted average motion vector is then applied to remove the
motion between two successive frames. This type of processing results in all motion (including desired camera motion
such as pans) being removed from the sequence, not just unwanted motion. Again, these methods assume the image
sequences contain a dominant global motion, and they may fail in the presence of significant local motion.

[0006] Still other digital image processing methods for removing unwanted motion make use of a technique known
as phase correlation for precisely aligning successive frames. An example of such a method has been reported by
Eroglu et al. ("A fast algorithm for subpixel accuracy image stabilization for Digital Film and Video," in Proc. SPIE Visual
Communications and Image Processing, Vol. 3309, pp. 786-797, 1998). However, these methods require that the
sequence has no local motion, or alternatively, a user must select a region in consecutive frames that has no local
motion. The dependence upon areas with no local motion and the necessity for user intervention are major drawbacks
of these methods.


Problem to Be Solved by the Invention 

[0007] The invention solves the problem of removing unwanted motion from a digital image sequence without
removing desired motion (e.g., pan, zoom, etc.). It does so without excessive computational requirements, and it is a
fully automated process. Furthermore, it is robust in the presence of significant local motion in the image sequence.

SUMMARY OF THE INVENTION 

[0008] The present invention overcomes the limitations of conventional systems by using a simple model that is
based on the observation that the cumulative motion vectors corresponding to the desired motion will generally vary
smoothly from frame to frame. Further, the invention uses a motion vector histogram with a simple threshold, and does
not rely on a specific camera transformation model (or the absence of local motion). It is, therefore, an object of the
present invention to provide a structure and method that uses a motion vector histogram in determining the unwanted
motion components in a digital image sequence.

[0009] One embodiment of the invention is a method for stabilizing a digital image sequence consisting of a number
of successive frames. The method includes: calculating a motion vector field between adjacent frames; forming a
motion vector histogram from horizontal and vertical components of the motion vector field; applying a threshold to the
motion vector histogram to produce a thresholded motion vector histogram; generating average horizontal and vertical
motion components from the thresholded motion vector histogram; filtering the average horizontal and vertical motion
components over a number of frames to identify unwanted horizontal and vertical motion components for each of the
frames; and stabilizing the image sequence by shifting each frame according to the corresponding unwanted horizontal
and vertical motion components.

[0010] The thresholding of the motion vector histogram removes undesirable motion vectors that are likely to be
unreliable or correspond to objects that have a small spatial extent. This threshold can be changed for each frame by
an adaptive means or can be fixed at a pre-specified level. The unwanted horizontal and vertical components
correspond to high temporal frequencies, and they can be computed by applying a highpass filter to the average horizontal
and vertical components. The degree of highpass filtering is user-adjustable. The stabilizing includes a displacement
of each frame by the corresponding unwanted horizontal and vertical components.

[0011] Another embodiment of the invention is a computerized digital imaging system for stabilizing a digital image
sequence formed from a number of successive frames including: a motion estimation unit for calculating a motion
vector field between adjacent frames; a histogram generator unit for forming a motion vector histogram from horizontal
and vertical components of the motion vector field; a thresholding unit for applying a threshold to the motion vector
histogram to produce a thresholded motion vector histogram; an averaging unit for generating average horizontal and
vertical components from the thresholded motion vector histogram; a filtering unit for filtering the average horizontal
and vertical components over a number of frames to identify unwanted horizontal and vertical components for each of
the frames; and a stabilizing unit for stabilizing the image sequence by translating each frame according to the
corresponding unwanted horizontal and vertical components. The thresholding unit removes motion vectors that are likely
to be unreliable or correspond to objects that are temporally transient or have a small spatial extent. The system further
includes a threshold determination unit for adaptively computing a threshold for each frame, or alternatively, the
threshold may be fixed at a pre-specified level. The filter includes a highpass filter for identifying the unwanted horizontal and
vertical components. The system further includes a user interface for adjusting the degree of highpass filtering. The
stabilizing unit includes a displacement unit for shifting each frame by the corresponding unwanted horizontal and
vertical components.

ADVANTAGES OF THE INVENTION 

[0012] One advantage produced by the invention is that unwanted global motion is removed from a digital image
sequence in a fully automated operation. With the invention, no user intervention is required, although the user has
access to some system parameters to control usage and degree of stabilization. Also, the invention performs robustly
across different scene content. Thus, the sequence may contain substantial local motion without significantly
affecting the removal of the unwanted global motion. Moreover, the desired camera motion (pan, zoom, etc.) is not
removed during the inventive stabilization process.

[0013] Further, with the invention, there is a minimal computational load. The estimation of the motion vector field is
the most time-consuming component. However, this can be done efficiently with block-based motion estimation
methods. Additionally, the invention allows stabilization to sub-pixel levels (e.g., below human perceptual thresholds). Finally,
the use of the motion vector histogram offers the potential for further improvements, including the tracking of multiple
motion vector clusters for improved estimates of unwanted motion.


BRIEF DESCRIPTION OF THE DRAWINGS 

[0014] The foregoing and other objects, aspects and advantages will be better understood from the following detailed 
description of preferred embodiments of the invention with reference to the drawings, in which: 


FIG. 1 is a schematic diagram illustrating an automated image stabilization method/system according to the
invention;

FIG. 2 is a diagram illustrating a two-dimensional (2-D) motion vector histogram according to the invention;

FIG. 3 is a graph illustrating unwanted motion according to the invention;

FIG. 4 is a graph illustrating the removal of unwanted motion according to the invention; and

FIG. 5 is a hardware embodiment of a computer system for operating the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

[0015] The invention removes unwanted global motion in a digital image sequence. Unwanted motion adversely
impacts overall image quality and potentially reduces performance of current compression systems such as MPEG-2.
The present invention is especially useful with film-originated source material (i.e., motion pictures that have been
scanned into digital form) containing unwanted jitter (also known as hop/weave), which arises from various physical
components in the imaging chain (e.g., perforation nonuniformities, media instability, mechanical aspects of
capture/duplication/projection devices, etc.). However, as would be known by one ordinarily skilled in the art given this
disclosure, the same method could also be used to stabilize unwanted camera motion in video-originated sequences.

[0016] As shown in Figure 1, each frame from the original unstabilized sequence 100 is first processed with the
preceding frame using a motion estimation unit 102 to form a motion vector field MVF(t) that indicates correspondence
points between the current frame at a given time t and the preceding frame at time t-1. As is known by one ordinarily
skilled in the art, the computation of a motion vector field is only valid for successive frames that share at least some
similarity. If the sequence consists of multiple scenes (i.e., frames with significantly different image content), it is first
necessary to segment the sequence into individual scenes (where each scene has a continuity of image content)
prior to the computation of motion vectors. This process of segmenting a sequence into individual scenes is sometimes
known as scene change detection, and there have been numerous descriptions of such conventional methods.
Alternatively, manual methods for scene detection can be used, where a person marks each scene change after viewing
the sequence. Scene change detection is not a large part of the present invention, and it is assumed that such
processing has been done prior to the application of the inventive image stabilization.

[0017] The motion vector field can be formed using a variety of methods, but in a preferred embodiment, the vector
field is formed using block-based motion estimation for efficient computations. As is well known to those ordinarily
skilled in the art, block-based motion estimation involves forming b × b blocks of pixels from the current frame, and for
each block, finding a b × b block in the preceding frame that is the best match according to some error metric. A typical
value for b is 8 or 16 pixels, and mean-squared error (MSE) or mean absolute distortion (MAD) is often used as the
error metric. The output of the block matching process is an n × m set of motion vectors, which comprise the motion
vector field. Increasing n and/or m (i.e., increasing the density of the motion vector field) may lead to improved
performance of the current invention, but the drawback is that the number of computations is increased. The invention
performs adequately with moderate values for n and m. As would also be readily known by those ordinarily skilled in
the art given this disclosure, other motion estimation methods, such as optical flow, could certainly be used and may
provide improved performance at the expense of increased computations. As each frame is sent to the motion
estimation unit 102, it is also sent to a buffer 101, which stores N frames in order to allow the unwanted motion to be
removed from the original sequence during a subsequent stabilization stage 107.
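
The block matching described in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `block_motion_vectors`, the exhaustive search, the `search` range parameter, the tie-break toward small displacements, and the sign convention (each vector points from a block in the current frame to its best match in the preceding frame) are all assumptions made for illustration. Frames are plain lists of rows of grey levels, and MAD is used as the error metric.

```python
def block_motion_vectors(prev, curr, b=8, search=4):
    """Exhaustive block matching with mean absolute distortion (MAD).

    For each b x b block of `curr`, return the displacement (dx, dy) of the
    best-matching b x b block in `prev`. The search range and the tie-break
    toward small displacements are illustrative assumptions.
    """
    h, w = len(curr), len(curr[0])
    vectors = []
    for by in range(0, h - b + 1, b):
        for bx in range(0, w - b + 1, b):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = by + dy, bx + dx
                    if py < 0 or px < 0 or py + b > h or px + b > w:
                        continue  # candidate block falls outside the frame
                    mad = sum(
                        abs(curr[by + j][bx + i] - prev[py + j][px + i])
                        for j in range(b)
                        for i in range(b)
                    ) / (b * b)
                    key = (mad, abs(dx) + abs(dy))
                    if best is None or key < best[0]:
                        best = (key, (dx, dy))
            vectors.append(best[1])
    return vectors
```

A call such as `block_motion_vectors(prev, curr, b=16, search=15)` would yield one vector per block, in row-major block order; a denser field would need overlapping blocks, which this sketch does not attempt.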

[0018] The motion vector field from each pair of adjacent frames is then processed with a histogram generator unit
103 to produce a total 2-D motion vector histogram, $H_{total}(x,y,t)$, of horizontal (x) and vertical (y) translation values at a
given time t. The histogram generator unit 103 computes the histogram by summing the number of occurrences of a
particular (x,y) translation pair over the motion vector field. An example of a motion vector histogram is shown in Figure
2. In this figure, the x and y translation values can each vary from -15 pixels to +15 pixels.
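
As a rough illustration of the histogram generator, the sketch below counts how often each (x, y) translation pair occurs in a list of motion vectors. The function name `motion_vector_histogram` and the `max_shift` parameter are hypothetical; the default of 15 simply mirrors the range shown in Figure 2, and discarding vectors outside that range is an assumption of this sketch.

```python
from collections import Counter


def motion_vector_histogram(vectors, max_shift=15):
    """Count occurrences of each integer (x, y) translation pair.

    Vectors outside +/- max_shift are ignored (an assumption of this sketch).
    """
    hist = Counter()
    for dx, dy in vectors:
        if abs(dx) <= max_shift and abs(dy) <= max_shift:
            hist[(dx, dy)] += 1
    return hist
```

A sparse `Counter` is used instead of a dense 31 × 31 array because most bins are empty for a typical frame pair.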

[0019] As illustrated in Figure 2, a 2-D histogram of the motion vector components will typically contain one or more
clusters 200 with a significant number of counts, where the clusters indicate regions within the image that are moving
with the same velocity and direction. There will also be numerous clusters 201 containing only a few counts, which
indicate either small spatial regions or, more commonly, inaccurate motion vectors (which result from noise,
ambiguous signal content in the image, etc.).

[0020] Since these motion vector clusters 201 are likely to be misleading in subsequent processing, they are
preferably discarded. As would be known by one ordinarily skilled in the art given this disclosure, any number of techniques
can be used to promote the removal of the clusters 201 having only a few counts. In a preferred embodiment, the
removal is performed by applying a simple threshold filter to the histogram count values, i.e., retaining only those
histogram values that equal or exceed a threshold T. Referring to Figure 1, this thresholding process is performed by
a thresholding unit 104. The output of the thresholding unit is a thresholded motion vector histogram $H_{thresh}(x,y,t)$.

[0021] Besides thresholding, other methods can also be introduced to reduce the number of unreliable motion vectors,
including discarding vectors based on: 1) the variance of a block (where low variance blocks are essentially matching
noise, not features), 2) the summed error between corresponding points in matched blocks (where high error indicates
the blocks are not similar), and 3) the number of blocks that match with equivalent error (where there is no way to
distinguish which matched block is the proper one). All of these methods can be applied in addition to the histogram
thresholding, but are not necessary to the proper functioning of the current invention.
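
The simple threshold filter of paragraph [0020] can be sketched in a few lines of Python. The histogram is assumed to be a mapping from (x, y) translation pairs to counts, and the name `threshold_histogram` is hypothetical.

```python
def threshold_histogram(hist, T):
    """Keep only the histogram bins whose count equals or exceeds T."""
    return {vector: count for vector, count in hist.items() if count >= T}
```

For example, with `hist = {(1, 0): 10, (5, 5): 1, (0, 0): 3}` and `T = 3`, the sparse bin `(5, 5)` is dropped and the two larger clusters are kept.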

[0022] The present invention is not particularly sensitive to any specific histogram threshold value, and the threshold
T can be set to a fixed value for all frames in a sequence. However, as would be known by one ordinarily skilled in the
art given this disclosure, setting the threshold too high may result in only a few vectors being retained, which can lead
to unstable behavior. Likewise, setting the threshold too low may result in the inclusion of inaccurate motion vectors. To
reduce the potential for poor performance caused by an inappropriate threshold, it is possible to adapt the threshold
for each frame. To determine this adaptive threshold, the invention models the total motion vector histogram as the
sum of two components:

$$H_{total}(x,y,t) = H_{reliable}(x,y,t) + H_{unreliable}(x,y,t) \qquad (1)$$

where $H_{reliable}(x,y,t)$ is the histogram component corresponding to reliable motion vectors at time t, and
$H_{unreliable}(x,y,t)$ is the histogram component corresponding to unreliable motion vectors at time t.

[0023] Modeling these histogram components as random variables, it is easy to derive a relationship for the means
of the histograms:

$$\mu_{total} = \mu_{reliable} + \mu_{unreliable} \qquad (2)$$

where $\mu_{total}$ is the mean of $H_{total}(x,y,t)$, $\mu_{reliable}$ is the mean of $H_{reliable}(x,y,t)$, and $\mu_{unreliable}$ is the mean of
$H_{unreliable}(x,y,t)$. Because the histogram values are never negative, the mean of the total histogram is at least as large as the mean
of the histogram for the unreliable vectors. By selecting the threshold to be the mean of the total histogram values
(i.e., $T = \mu_{total}$), it is generally assured that most of the unreliable vectors will be below the threshold and hence will not
be included in subsequent calculations. It may also be desirable to elevate this threshold by some amount to reduce
further the possibility of including unreliable vectors. One approach is to use the weighted sum of the total histogram
average and the peak value of the histogram, P:

$$T = a\,\mu_{total} + (1-a)\,P \qquad (3)$$

where a is a number between 0 and 1. If a = 0, the threshold is the peak value of the histogram, which corresponds to
retaining only the mode of the histogram.
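
Equation (3) can be evaluated directly once the mean and peak of the histogram are known. The sketch below is illustrative only: the name `adaptive_threshold` is hypothetical, and the text does not say whether the mean is taken over all bins or only over occupied bins, so the optional `num_bins` argument exposes that choice (occupied bins by default, which is an assumption).

```python
def adaptive_threshold(hist, a=1.0, num_bins=None):
    """Return T = a * mu_total + (1 - a) * P for a non-empty histogram.

    hist maps (x, y) pairs to counts. By default the mean is taken over
    occupied bins only; pass num_bins (e.g. 31 * 31) to average over the
    full histogram grid instead.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie between 0 and 1")
    counts = list(hist.values())
    n = num_bins if num_bins is not None else len(counts)
    mu_total = sum(counts) / n
    peak = max(counts)
    return a * mu_total + (1 - a) * peak
```

With `a = 1` the threshold equals the histogram mean, and with `a = 0` it equals the peak count, so only the mode survives thresholding, as stated above.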

[0024] In the present invention, it may also be advantageous to process the histogram prior to computing the threshold
and/or applying the threshold to the histogram. Such preprocessing can include smoothing the histogram with simple
linear filters. This smoothing process reduces local variations in the histogram, which can lead to more robust exclusion
of unreliable motion vectors, while still retaining reliable motion vectors.

[0025] Another aspect of the invention is a histogram processing method that computes separate one-dimensional
(1-D) motion vector histograms for the x and y translation components. The motivation for this processing is that, in
certain image regions, the x component may be accurate while the y component is inaccurate, or vice versa. By forming
1-D histograms and computing individual thresholds, $T_x$ and $T_y$, for the horizontal and vertical components, respectively,
it is possible that more reliable vectors for both the x and y components may be retained.
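
One way to picture the 1-D variant is to build a separate histogram for each component and threshold each one independently. In the sketch below the function name `thresholded_1d_histograms` and the argument names `Tx` and `Ty` are assumptions chosen to mirror $T_x$ and $T_y$; the sketch does not claim to reproduce how the invention combines the two results afterwards.

```python
from collections import Counter


def thresholded_1d_histograms(vectors, Tx, Ty):
    """Form 1-D histograms of the x and y components and threshold each."""
    hist_x = Counter(dx for dx, _ in vectors)
    hist_y = Counter(dy for _, dy in vectors)
    kept_x = {value: count for value, count in hist_x.items() if count >= Tx}
    kept_y = {value: count for value, count in hist_y.items() if count >= Ty}
    return kept_x, kept_y
```

For the vectors `[(1, 0), (1, 3), (1, -2), (4, 0)]`, the x value 1 occurs three times even though no single (x, y) pair occurs more than once, which shows how a component can remain usable when the 2-D bins are too sparse.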

[0026] Referring again to Figure 1, the thresholded motion vector histogram is then sent to an averaging unit 105
to compute average horizontal and vertical translation vectors for each of the frames. After thresholding to eliminate
the unreliable vectors, the following relationship exists:

$$H_{thresh}(x,y,t) \approx H_{reliable}(x,y,t) \qquad (4)$$

That is, the thresholded motion vector histogram contains primarily reliable motion vectors.

[0027] The average horizontal and vertical translation vectors at a given time t are then computed as:



where $\bar{x}(t)$ is the average horizontal translation vector and $\bar{y}(t)$ is the average vertical translation vector at time t. The
invention uses simple averages of the histogram values because adjacent frames in a sequence (assuming they come
from the same scene) will contain roughly the same objects undergoing smoothly varying local motion. This is expected
as physical objects have mass, and thus cannot change speed or direction instantaneously. Regardless of the number
and extent of moving objects, it is expected that the cumulative motion (i.e., the average horizontal and vertical motion
vectors) will be smoothly varying from frame to frame. Further, besides local motion, there is also global motion such
as desired camera motion (pan, zoom, etc.) and unwanted motion (jitter, camera shake, etc.). The desired camera
motion is also typically smooth (particularly in professional cinema), but the unwanted motion will be rapidly and
randomly varying. As a result, the average motion of all objects in the scene will tend to vary smoothly from frame to
frame, while the global motion components introduced by the unwanted motion of jitter, camera shake, etc. generally
will not vary smoothly.
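
A simple average over a thresholded histogram can be sketched as a count-weighted mean of the retained translation values. This particular normalization (dividing by the total retained count) is an assumption of the sketch and is not taken from the source, and the name `average_translation` is hypothetical.

```python
def average_translation(hist_thresh):
    """Count-weighted mean of the x and y translations in a thresholded histogram.

    hist_thresh maps (x, y) pairs to counts. Returns (0.0, 0.0) when no
    vectors survive thresholding (an arbitrary fallback for this sketch).
    """
    total = sum(hist_thresh.values())
    if total == 0:
        return 0.0, 0.0
    x_bar = sum(x * count for (x, _), count in hist_thresh.items()) / total
    y_bar = sum(y * count for (_, y), count in hist_thresh.items()) / total
    return x_bar, y_bar
```

Calling this once per frame pair produces the two time series of average horizontal and vertical translations that are examined for smoothness.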

[0028] Because the unwanted motion components vary more rapidly than the desired motion, the invention filters
the average horizontal and vertical motion vectors over a number of frames to extract unwanted motion, using a
temporal filtering unit 106 in Figure 1. As explained above, the average horizontal and vertical translations should change
gradually from frame to frame in the case of the desired motion. Therefore, any motion that deviates from a smoothly
varying trend will be identified as unwanted motion. For example, as shown in Figure 3, the average horizontal or
vertical component is plotted against time for frames 1-6. As shown, frame 4 deviates from the smooth progression of
frames 1-3, 5 and 6. Therefore, frame 4 is identified as having unwanted motion and thus requires stabilization.

[0029] To compute the unwanted motion that must be removed from frame 4, standard filtering techniques can be
applied to the time series formed from the average translation vectors. The unwanted motion is assumed to consist of
high frequency components, and highpass filtering can be used to directly extract this unwanted motion. Alternatively,
lowpass filtering can be used to extract the desired low frequency motion, and the resulting smoothed translation
vectors can then be subtracted from the average motion vectors to compute the unwanted high frequency components.
These two approaches for extracting the unwanted motion are entirely equivalent. The desired motion and
corresponding unwanted motion components of frame 4 are shown in Fig. 4. This unwanted motion component can be used to
determine the amount of horizontal and/or vertical stabilization required to correct frame 4.
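
The lowpass-and-subtract route can be illustrated with a centered moving average, which is only one possible lowpass filter and is chosen here for brevity. The function name `unwanted_motion`, the shortened window at the ends of the series, and the sample values are assumptions; the sample series merely imitates the shape described for Figure 3, with the fourth frame off the trend.

```python
def unwanted_motion(series, N=3):
    """Subtract a centered N-tap moving average from each sample.

    series holds the average translation per frame; N should be odd.
    Near the ends the window is shortened to the available samples.
    """
    half = N // 2
    residual = []
    for t in range(len(series)):
        lo = max(0, t - half)
        hi = min(len(series), t + half + 1)
        smooth = sum(series[lo:hi]) / (hi - lo)  # desired (lowpass) motion
        residual.append(series[t] - smooth)      # unwanted (highpass) motion
    return residual


# Frames 1-6 with the fourth frame deviating from the trend.
print(unwanted_motion([1, 2, 3, 7, 5, 6]))
```

With such a short window the residual is largest at the fourth frame, but part of the deviation also leaks into the neighbouring frames, which is one reason the filter length matters.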

[0030] In a preferred embodiment to reduce buffering requirements, a simple FIR (finite impulse response) highpass
filter is used to extract the unwanted high frequency motion from the time series of average horizontal and vertical
translations. The length of the filter, N, which corresponds to the number of frames that must be buffered, has a significant
impact on the removal of unwanted frequency components from the global motion estimate. A tradeoff occurs between
the lowest frequency that can be removed and the number of frames that must be buffered. A user is preferably given
the option of having direct control over the degree of filtering in both the horizontal and vertical directions, and the
impact of removing certain components of the unwanted motion can be viewed interactively. Besides the length of the
filter, a user can also be given control over the maximum rate of change of the desired motion, thus providing an
adjustment of the slew rate of the desired motion.
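
To make the role of the filter length N concrete, the sketch below builds the taps of one very simple FIR highpass filter, a unit impulse minus an N-tap moving average, and applies it to N buffered averages. This choice of taps is an assumption for illustration; the source does not specify the filter coefficients. The names `highpass_taps` and `filter_center` are hypothetical.

```python
def highpass_taps(N):
    """Taps of a length-N FIR highpass: unit impulse minus a moving average."""
    if N < 1 or N % 2 == 0:
        raise ValueError("N must be a positive odd number")
    taps = [-1.0 / N] * N
    taps[N // 2] += 1.0  # impulse at the center tap
    return taps


def filter_center(buffered, taps):
    """Apply the taps to N buffered averages; the result is the unwanted
    motion estimate for the frame at the center of the buffer."""
    if len(buffered) != len(taps):
        raise ValueError("buffer length must equal the filter length")
    return sum(c * v for c, v in zip(taps, buffered))
```

The taps sum to zero, so a constant offset or a steady pan such as `[0, 1, 2, 3, 4]` passes through as desired motion with an estimate of zero, while a longer N lets slower fluctuations be classed as unwanted at the cost of buffering more frames.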

[0031] The use of the motion vector histogram in extracting unwanted motion also provides another benefit that is
not readily apparent. As mentioned previously, the clusters 200 in Figure 2 represent regions in an image that are
moving with the same velocity and direction. It is possible to track each of these motion vector clusters as separate
entities over successive frames, thereby producing multiple estimates for the unwanted motion. These multiple
estimates can then be combined in a variety of ways to produce a single estimate for the unwanted motion. For example,
a weighted sum of the individual estimates could be formed, where the weights are determined by the degree of
confidence in each individual estimate.

[0032] Now, the output of the temporal filtering unit 106 in Figure 1 is an estimate of the unwanted global motion in
both the horizontal and vertical directions at a given time t. The unwanted horizontal motion is denoted as $x_u(t)$ and
the unwanted vertical motion is denoted as $y_u(t)$. Given this estimate, the corresponding input frame for time t is
stabilized relative to the reference frame (which is typically the first frame in the sequence) using a stabilizing unit 107.
The stabilization process is done through simple horizontal and vertical displacements of the current frame. The
estimates for the unwanted motion will typically be non-integer values, which means a standard image interpolation method
(such as bilinear or cubic interpolation) is required to allow sub-pixel displacements. Sub-pixel displacements are also
required to put any unwanted, but uncompensated, motion to levels below the human perceptual thresholds.
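
A sub-pixel displacement with bilinear interpolation can be sketched as follows. The function name `shift_frame`, the edge clamping, and the convention that positive `dx` and `dy` move image content right and down are assumptions of this sketch; under that convention, cancelling an unwanted motion of $(x_u, y_u)$ would mean calling `shift_frame(frame, -x_u, -y_u)`, subject to the sign convention of the motion estimator.

```python
import math


def shift_frame(frame, dx, dy):
    """Translate a frame by (dx, dy) pixels using bilinear interpolation.

    frame is a list of rows of grey levels. Output pixel (x, y) samples the
    input at (x - dx, y - dy); samples outside the frame are clamped to the
    nearest edge pixel (an assumption of this sketch).
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = min(max(x - dx, 0.0), w - 1.0)
            sy = min(max(y - dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
            bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
            out[y][x] = (1 - fy) * top + fy * bottom
    return out
```

Cubic interpolation would follow the same structure with a four-tap kernel per axis and is generally sharper, at a higher cost per pixel.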

[0033] While the overall methodology of the invention is described above, the invention can be embodied in any
number of different types of systems and executed in any number of different ways, as would be known by one ordinarily
skilled in the art. For example, as illustrated in Figure 5, a typical hardware configuration of an information
handling/computer system in accordance with the invention preferably has at least one processor or central processing unit
(CPU) 400. The CPU 400 is interconnected via a system bus 401 to a random access memory (RAM) 402, read-only
memory (ROM) 403, input/output (I/O) adapter 404 (for connecting peripheral devices such as disk units 405 and tape
drives 406 to the bus 401), communication adapter 407 (for connecting an information handling system to a data
processing network), user interface adapter 408 (for connecting peripherals 409, 410, 411 such as a keyboard, mouse,
digital image input unit, microphone, speaker and/or other user interface device to the bus 401), a printer 412, and a
display adapter 413 (for connecting the bus 401 to a display device 414). The invention could be implemented using
the structure shown in Figure 5 by including the inventive method within a computer program stored on the storage
device 405. Such a computer program would act on an unstabilized time series of image frames supplied through the
interface units 409, 410, 411 or through the network connection 407. The system would then automatically produce
the desired stabilized digital image frame series output on the display 414, through the printer 412 or back to the
network 407.

[0034] The invention removes jitter from film-originated sequences as part of the image processing that is performed
during, for example, the telecine acquisition process. However, as would be known by one ordinarily skilled in the art
given this disclosure, the invention is not limited to such an application and could be applied to video-originated
sequences to remove unwanted translational camera motion. Moreover, the invention operates in a fully automated
manner with no user intervention, although the user may be given control over some system parameters to control the degree
of stabilization. The invention has relatively low complexity and could operate in real-time on an image sequence given
sufficient computer hardware. Finally, the use of a motion vector histogram to extract unwanted motion provides the
opportunity to track multiple motion vector clusters, which can result in improved estimates for the unwanted motion
as compared to computing a simple average from all motion vectors.



Claims 

1. A method for stabilizing an image sequence formed from a plurality of successive frames comprising:

calculating motion vectors for each of said frames;

forming a motion vector histogram for each of said frames;

generating average horizontal and vertical components from said motion vector histogram for each of said
frames;

identifying unwanted horizontal and vertical components for each of said frames based on said average
horizontal and vertical components; and

stabilizing said image sequence by translating each frame by said unwanted horizontal and vertical
components.


2. The method in claim 1, wherein said motion vectors follow corresponding points between adjacent frames.

3. The method in claim 1, wherein said motion vector histogram comprises horizontal and vertical components of
said motion vectors.

4. The method in claim 1, wherein said identifying includes filtering said average horizontal and vertical components
from two or more successive frames to identify unwanted horizontal and vertical components for each of said
frames.

5. The method in claim 1, further comprising removing unreliable motion vectors from said motion vector histogram
to form a processed motion vector histogram for each of said frames.

6. The method in claim 4, wherein said filtering comprises highpass filtering.

7. The method in claim 4, wherein said filtering is user-adjustable.

8. The method in claim 5, wherein said removing includes discarding values from said motion vector histogram below
a threshold.

9. The method in claim 8, wherein said threshold is user-adjustable.

10. The method in claim 8, wherein said threshold is determined adaptively for each of said frames.


